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Learning the Ambiguity Surface 

Sofia C. Olhede 



Abstract 

This paper introduces tlie class of ambiguity sparse processes, containing subsets of popular nonstationary time series such as 
locally stationary, cyclostationary and uniformly modulated processes. The class also contains aggregations of the aforementioned 
processes. Ambiguity sparse processes are defined for a fixed sampling regime, in terms of a given number of sample points 
and a fixed sampling period. The framework naturally allows us to treat heterogeneously nonstationary processes, and to develop 
methodology for processes that have growing but controlled complexity with increasing sample sizes and shrinking sampUng 
periods. 

Expressions for the moments of the sample ambiguity function are derived for ambiguity sparse processes. These properties 
inspire an Empirical Bayes shrinkage estimation procedure. The representation of the covariance structure of the process in terms of 
a time-frequency representation is separated from the estimation of these second order properties. The estimated ambiguity function 
is converted into an estimate of the time-varying moments of the process, and from these moments, any bilinear representation 
can be calculated with reduced estimation risk. Any of these representations can be used to understand the time-varying spectral 
content of the signal. The choice of representation is discussed. Parameters of the shrinkage procedure quantify the performance 
of the proposed estimation. 

Keywords: Ambigtiity function, harmonizable process, nonstationary time series, time-varying spectrum. 

I. Introduction 

Assumptions on stationarity dominate time series analysis, and are a necessary requirement for most of traditional time 
series methodology to work. Unfortunately many practical inference problems violate these assumptions. For this reason the 
last fifty years of time series analysis have seen considerable developments in generalizing methodology to encompass processes 
violating stationarity constraints, see for example the class of semi-stationary processes introduced by [1], the class of locally 
stationary processes due to [2], methods of estimating local spectral summaries in a nonparametric framework introduced by 
[3], [4] and [5]. Time domain notions of local stationarity are also discussed by e.g. [6] etc; and we mention wavelet based 
models, see e.g. [7]. These developments have significantly advanced the tool-kit available for data analysis and modelling, 
but when using them we are still strongly limited in terms of permitted ways of modelling the covariance of a process so 
that it can be estimated stably. Also while within a single framework there is a representation of the covariance function as a 
"time-varying spectrum," this is most usually poorly defined between model classes and does not yield a model independent 
definition. 

In tandem to these developments in statistics, in signal processing the theory of biUnear representations of covariance 
structures has reached maturity, see e.g. Flandrin [8, Ch. 3] or [9]. Modem understanding of time-varying spectral representations 
is that no particular family of representation is uniformly optimal across different model-classes in terms of representing 
structure, and there has been less focus on producing estimators. For example two broad classes of nonstationary time series 
are those of Harmonizable processes [10] and Karhunen processes [11]; however the practical utility of these classes is limited, 
as they have no general inference methods associated with them. Of particular note in modern development is the class of 
underspread processes introduced by Matz, Hlawatsch and Kozek (see e.g. [12], [13]), with associated inference methods, e.g. 
[14]. Underspread processes are those which have been sampled sufficiently rapidly to be considered smoothly varying with the 
given sampling, and these are strongly linked to the class of semi- stationary processes, that are constrained in terms of properties 
of a Fourier transform of the local spectrum called the Loeve spectrum, see e.g. [8]. All of the existing inference methods for 
these types of processes are based on a notion of local smoothing, and removing medium and high frequency structure. This 
violates how modern data is normally acquired - in contrast to an underspread process we expect the complexity of the data 
to grow as we increase the sampHng size N and decrease the sampHng period At, even if we do expect some imderlying 
simplicity in the representation of this process. To model this kind of process we introduce a new class of nonstationary 
processes called ambiguity sparse processes, whose Ambiguity Functions (AF) are mainly supported over a set of regions. The 
AF is a Fourier Transform of the autocovariance sequence transforming in the global time index rather than in the time-lag. 
Processes that are ambiguity sparse, if sampled more finely than the available data, are also underspread; however we expect 
data to grow in complexity with the sampling. 

After defining the AF from the time-varying covariance of the process, we derive the statistical properties of its sample 
version for an ambiguity sparse process, which inspires an Empirical Bayes' method of estimation. We discuss the properties 
of this estimation method, and describe how the estimate of the AF can be converted into an estimate of the basic object 
of the representation of a nonstationary process, namely the covariance sequence of the observed process. This covariance 
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sequence in turn can be represented in terms of any local spectral estimator in the bilinear class, see e.g. Flandrin [8, Ch. 3]. 
We discuss the interpretation of our estimators, as "smoothing" the sample covariance sequence adaptively. The performance of 
the estimation can be quantified in terms of parameters in the Empirical Bayes procedure. An additional benefit of ambiguity 
sparse processes is that it is very easy to characterise when sums of the processes, or indeed linear filtered versions of such 
processes, remain in the class of ambiguity sparse processes. 

We illustrate the properties of the proposed estimation procedures on a number of simulated and real data examples. We 
discuss different choices of time-varying spectral representation, and two different methods of regularizing the estimation 
procedure to ensure that the estimated autocovariance sequence is positive definite, and thus corresponds to a valid estimator 
of covariance. 

II. The Four Faces of Time-Frequency 

A. Time-Time Representation 

We assume a single real- valued time series {Xn] is under analysis. This series is assumed to be mean-zero, or it is assumed 
that the true mean has already been estimated and removed. Furthermore we assume {X„} is a Gaussian process. Because we 
have made few assumptions on the smoothness or structure of the covariance sequence of {X„}, many distributions could be 
well approximated by this mixture of Gaussians. {X„} are the samples of a continuous time process {X{t)} at time points 
tn = nAt, with n = 0, . . . , iV — 1, and At > is the sampling period. Because is a Gaussian process and we know 

the first moment is zero to make inferences we need to model the second order structure or 

Xr(i„) -E{X„X„_,}, T = -iV + n + l,...n, n = 0,...,iV-l. (1) 

Writing down this definition is trivial, however finding processes for which methods can be designed such that {A^r (^n)} can be 
estimated consistently from the available data is a substantially harder problem. If a process is stationary then A^^(t„) = M.T- 
for all tn and additionally the sequence {A^t} has elements that are finite for all t, where the elements can be estimated 
stably either using method of moments, or a parametric approximation to the full covariance, such as using a truncated version 
of the Wold decomposition, e.g. a moving average model. We refer to i„ G M, with n e Z as the global time index, and r e Z 
as the local time index, or time-lag. 

An important modelling tool for stationary time series is the spectral density function (sdf) defined when the integrated 
spectrum is absolutely continuous [15] if {Xn = X{tn)} has been sampled sufficiently finely in time by [16, p. 196], 

m=At £ A^.e— for-^</<^, (2) 

r — — oo 

and substantial efforts have also gone into defining transforms of {A^r(^n)}T ^o an interpretable time-varying spectral 
representation, see e.g. [1], [17], [9, Ch. 6] or [8, Ch. 2.3, 2.4, 3.1, 3.3]. Issues arise on two different counts; firstly it is 
very hard to define a suitable time-varying spectral density definition from a known and given deterministic A^T-(t„), secondly 
designing estimation methods with suitable properties of this object is even harder (see some early discussion in [18]). 

It is well known that defining time-varying spectra from the covariance structure of real-valued signals is problematic. 
Interference from negative frequencies can cause non-interpretable terms to appear in any bilinear representation [19]. Thus 
to improve the mathematical properties of representations of the second order structure it is common practice to calculate the 
representation of the analytic signal of {X„}. We define the analytic signal by (with the assumption that At has been sampled 
finely to make out the important characteristics of the series) 

/■oo 

Zn = 2 dZx{f)e^"^'- df = Xn+ iYn, Yn^H {Xn} , (3) 
Jo 

where y„ is defined by this equation, and H{-} is the Hilbert transform. For a finite sample with fixed At and N we 
approximate H{-} using the discrete Hilbert transform. {Zn}, or the analytic signal, can exhibit no interference between 
negative and positive frequencies. 

It may be the case that the positive frequencies are still correlated with their conjugate, see [20], and to complete our second 
order description we do not only need to model and estimate the autocovariance of {Zn} defined by (recalling E{Z„} = 0) 

Mr{tn)=E{ZnZ:^_^}, (4) 

but also we need to model Rr{tn) = E {ZnZn-r} (the so-called relation sequence [20, p. 55]). Discussing improper stochastic 
processes {e.g. those processes for which Rr(tn) ^ 0) is outside the scope of this paper and we note in passing that the methods 
we shall propose for the covariance sequence can also be straightforwardly extended to estimating the relation sequence. We 
do have to pay a price for analyzing {Zn} rather than {X„}: {F„} as defined in ([3]l is in general a "smeared out" version of 

{Xn}, and so M.r{tn) ^ Mr{tn). 
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B. Time-Frequency Representation 

A fundamental problem with defining a time-varying spectrum from {Mr{tn)} is instability [12], We could in theory shift 
the sequence in time so that we estimate Mi"\tn) = Mr i , and it would be equally reasonable to call this 

object the "local time-moment" of the process. [^Furthermore in general Mi°'^\tn) 7^ Mr"^''(i„) for ai 7^ a2- We could also 
subsequently calculate Fourier transforms in r for any given choice of a e [—5, ^] , see [12], to describe the time-varying 
spectral content of the process, with the global time shifted to = (n + — a) r] At. We could also define a local spectral 
representation by any shift a centering the moments in time 

CO 

S"it„J) = At Af,(t^)e-2"/-^*, (5) 

r — — 00 

where / is referred to as the global frequency. Each choice of a de facto leads to a different choice of a time-varying spectral 
representation. The most popular choice is a = [9]. We shall focus on estimating 5°(t„,/), from smoothing an empirical 
estimator of Mrit), but also will compare this representation with other choices, such as a = 1/2, corresponding to the 
Rihaczek distribution [21]. A larger class of time-frequency distributions is formed by additionally allowing a convolution of 
S^{tmf) in time and frequency, see [9]. We define the Short- Time Fourier Transform (STFT) as (for some given window 
function h(-)\ by J^''^\t, /) = VAi h{s - t)Zse-'^'''f'^K In statistics inference is often based on the STFT {e.g. 

[4]), and an important characteristic is its variance: 



)\ J(^''*)(t,/)| l=A< mr{ti~t)Mr{ti)e-^'''f^^\ ZUr{t)=h{t)h{t-T). (6) 

^ ii— — oor— — 00 

The expected magnitude of the STFT can also be represented as a FT of the moments of {Zt}. In general the 

window functions h{-) can be replaced by an arbitrary nonseparable kernel ■ujr{t) for greater flexibility, catering to variable 
smoothness of Mr{t), thus allowing for better frequency resolution when such is possible. The function Mr{t) has a different 
degree of smoothness in r and t and so using Eqn. J6| to threshold adaptively is better than the non-adaptive choice of 
vjr{t) = hlt)h{t — t). The variance of \j'^^''^\t, f)\ is large and additional smoothing is required to make this a good 
estimator. Clearly whatever is done subsequently to calculating | /)| depends strongly on the choice of the function 

MO- 



C. Frequency-Time Representation 

We have defined a Fourier transform of AIt (tn) as the time- varying spectrum. The Fourier transform of the moment sequence 
in global time t instead of in lag r defines the function 

00 

Ar{>^)^At Y Mr(t„) (7) 

n— — 00 

Ar{i') is the ambiguity function (AF) of see e.g. [8], [22]. A"{i/) can also be defined for any arbitrary choice of i", 

rather than centering the global time at i„, as in Eqn (|7]), however this will only have an effect as a phase-shift term, and 
will not change the support of the AF, as this depends only on the magnitude. The AF has been used to define underspread 
processes [13], e.g. those whose AFs are limited to a set of frequencies near the origin, or (j^, r) = (0,0). Very few useful 
processes are strictly underspread. For this reason Hlawatsch and Matz introduced nearly underspread processes which only 
require a suitable decay of the magnitude of the AF from the origin, e.g. that decays from the origin. Such processes 

will inevitably fit into existing theory in terms of inference, see e.g. [4], with the associated notion of asymptotics. 

For completeness we define the dual-frequency spectrum S{i^, /) as a FT of the AF S{iy, /) — At Ar{i')e'^'^"-f'^^'^ . 

As {Xn} is assumed harmonizable this quantity has an interpretation by using the spectral representation of {Zn}. We let the 
Eoeve spectrum (S'l(/i,/2)) be given by 

cov{dZ{fi), dZ{f2)} = >S'i,(/i, /2) cZ/id/2. We find that the dual-frequency spectrum is a shifted version of the Eoeve spectrum 
as S{i', /) = SL{f -\- V, /). The point of this shift is to align the stationary spectrum to concur structurally with S'(0, /). We 
can now describe the second order structure of the nonstationary process by any of these four different functions, namely 
the time- varying autocovariance Mr{tn), the dual-frequency spectrum S{v,f), the time-varying spectrum S{t, f) or the AF 
Ar{i'). Because of the uniqueness of Fourier transforms one might argue that any of these descriptions is equally useful, 
important and indeed complete. 

We shall introduce a new class of processes which if they did not grow in complexity with N and At and if you observed 
them over sufficiently large time-windows and at sufficiently small At would be underspread. As we collect more data, we 



'There is a problem in calculating this object if (it | — a) r ^ Z, that can either be solved using interpolation or by only using the moments corresponding 
to integer valued times. 
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generally observe processes of growing complexity - hence we do wish to define processes whose complexity nominally may 
increase with decreasing At and increasing N. We shall estimate their second order properties for a given (small) At and (large) 
N. We never anticipate to have all the energy of the process living at very low values of h', and we wish to reduce variance 
by instead zeroing out large (not necessarily connected) regions of the ambiguity plane. We therefore prefer to introduce these 
processes for a fixed sampling scheme as centred at a few time-frequency points and exhibit decay away from these points. 

Definition 1: Ambiguity Sparse Process 
A second order real-valued time series {X„ — X{nAt)} is denoted Ambiguity Sparse at sampling N,At if its AF can be 
represented for i^T S N in the form 



fe=i 



(8) 



with B^^\v,u) a smooth function near {v''f^\ rj^^ ) , taking a non-zero value at this point, 5^^^ > 0. 

To cope with more anisotropic structure, such as chirping components, Eqn ([H) can be rotated and dilated to convert the 
circular contours of ([Sj into elliptical contours with a given orientation, and all (subsequent) proofs would change mutatis 

mutandis in the rotated frame. If we define yk = diag ^a^*^-* 12'"'') ^ e (^At (^i^ — i^Q*^"*^ (^t — Tq'^-'^ /N^ , with R e = 
{(cos(0) — sin(6'), (sin(6') cos(6'))} this would correspond to a model of 

AWH«S(^)(i.,r/iV)[yi^y,] (9) 

Such a model would naturally treat stochastic chirping signals, see for example [23]. This allows nonstationary structure not 
necessarily aligned with the usual time-frequency axes. 

Remark 1: The Sum of Two Ambiguity Sparse Processes 
The sum of two ambiguity sparse processes at a fixed sampling is ambiguity sparse, as long as all the spreading kernels 
{'i'ni'^)} are limited in support. The time-varying sequence {i/)„(<)} (and their Fourier transform {5'„(z^)}) is defined so that 

00 00 

y„=At i'Uin)Xr,-h + Y^, At ^Pl{tr^)Mr^hit^h)=COY{Y„,X„-r}=Ml^■''\t„), (10) 

h— — oc h— — oo 

SO that GOV \ Y^^ , Xt-h} — for all /i e Z. Then with Ai^\i^) as the AF of a single process {Xt} and A'^'^\iy) as the FT 
of Mr^'^\tn) in tn (the cross-Ambiguity Function) we have 

= At ^ / vl;*(_(^' + ^.))A._^(j.')e2-'"''d:.'. (12) 

We therefore get a countably infinite number of kernels {5'„(z^)} spreading when constructing A'^'^^v) (and similarly 

Ai''^'^\i').) If the support of each of {'^hi^)} is limited in and h then A''t^'^\i^) and Ai''^'^\i^) are still sparse. Ai'^\iy) 
is sparse by assumption; so is A)- (v). Compare with the theory for semi-stationary processes, see [24], [25], which is 
complicated because a single oscillatory family must be used for both the processes {Xt} and {Yf} and it is hard to untangle 
potential correlation between processes. 

Remark 2: The notion of ambiguity sparse can directly be related for models of the smoothness of A/^(<). Assume that 
AIr{t), uniformly in r, is Sobolev order k. This implies that |^T-(t^)| < for sufficiently large v and positive constant 

£/. This class fits into that of ambiguity sparse processes, but also into the class of underspread processes. 

III. Method of Moments Estimation 
We define a naive estimate of the analytic autocovariance function by 

It is natural to start to define a time-varying spectrum starting from Af^(<„). We may form different time-varying spectral 
estimators by filtering any such choice of time-varying spectrum that is calculated from MT-(t„). Af^(t„) is unbiased when 
estimating the autocovariance of the analytic signal. Unfortunately its variance is very large. For example as {X„} is a Gaussian 
process then by Isserlis' theorem (see [26]) we can note that 



■{Mr(t„)} 



MQ{tn)Mo{tn-r) + Rr{tn)R-r{tn-r) ■ (14) 



UNIVERSITY COLLEGE RESEARCH REPORT 310 



5 



With our assumption of propriety the second term in Eqn ([T4]i is identically zero, but despite this, the estimator is to all intents 
and purposes useless, as it is far too variable. The usual approach to improving a variable but unbiased estimator is via some 
form of smoothing. We define the smoothed estimator (for some chosen kernel function uJT{t)) as 



M,(i„) = At 



(15) 



Ideally ujr{tn) would be a variable bandwidth kernel that is naturally adapted to the smoothness of Mrit) in t for each given 
value of T (cf Eqn (|6]l). Some examples of possible kernels for examples given in this paper are shown in Figure [T] These 
reinforce different structure, such as implementing local smoothing but keeping dependence across all lags (subplot (b) and 
(d)), reinforcing cyclostationary structure (subplots (a) & (c)), and truncating dependence (subplot (a)). 
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Fig. 1. The magnitude of tlie initial "smoothing" window used to smooth the empirical autocovariance function for the aggregation signal (a), for the 
simulated blood flow signal (b), for the bat signal (c), and for the oceanographic "meddy" signal (d). 



If we Fourier transform Mr{tn), then we obtain an estimator of the AF of {X„}. We define the EMpirical AF (EMAF) by 
a DFT of Mr{tn) 

N-l 



(16) 
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suffers from exactly the same problems as Mr{tn), e.g. a large variance. We note that 



n=0 11=0 

Thus our estimator of the ambiguity estimator is a multiplication of the EMAF with the FT of the kernel a;7-(t„), and so if 
I^^tI^^)! < 1, Ar{i') is a shrinkage estimator of Ar{i^). We must decide how to chose an appropriate kernel, or the degree 
of shrinkage at each lag and nonstationary frequency. This must be done considering the stochastic properties of the EMAF 
{At{u)}. Traditional estimators would smooth {M-rit)} in t uniformly in t to reduce mean square error - this corresponding 
to shrinking contributions non-adaptively at larger values of see e.g. [14, p. 4372]. We wish to retain more flexibility, and 
so do not wish to put in strong structural assumptions in our treatment of At{i'). Shrinkage must therefore be implemented 
adaptively. With the model of Definition [l] we shall determine the statistical properties of Ar{i'), to construct good shrinkage 
estimators of At-{v). 

Theorem 1: Concentration of Ambiguity Surface 
{X„} is assumed to be an ambiguity sparse process satisfying Eqn ([8]l, and T(t) = (A^ — |T|)Ai. 

T^^}) ^ l/(2Ai) as long as {v, t) is near one of the K contributions; Q\htxWi?,^Ji{5^^^>-{T{T){v-v[^^),T-T^I'^}) - 
l/(2At) — \v\. Furthermore if (;/, t) e VkiL) where we fix L > 0, and let 

W) - r) : ^mAt^{v - )2 + (r - r^'=))2 < ij , k = l,...,K, 

then 

2) If Eqn ^ holds with ^C^) G (|, l) the variance of At{v) takes the form: 

JCr{^-N,M)Vr{u), 



■{am} 



K-rii'] N, At) — {N ~ |-^|)4maxfc 1 _ j^^Q ^jj-j^ j-j^g normalized ambiguity variance Vt{v) defined implicitly. 

3) If {v, t) is not near any of the centre points then the relation of Ar{i^) is given by rel — TZr{v) — o{}Cr{v', N, At)). 

Proof: See Appendix A in supplementary material. ■ 
We additionally define 



2 



^rH - ^ - \r\)'+'^''^'-'^ Q - W\Atj , ViL) ^ UML). (18) 

We have now established that if (z^, r) is not sufficiently close to one of the points in the set { ('^i"^ ''^fc"^) } ^^^^ 
expectation gives a negligible contribution. On the other hand if (i/, r) is sufficiently close to one of the points in the set 
I , r^"-* ^ I then e|At-(i^)| is asymptotically divergent as Cr{v) becomes unbounded with increasing N. The variance 
of the EMAF in either of these two cases is non-negligible. Solely characterising the first and second moment of the AF 
is not sufficient to design an inference procedure unless we wish to use method of moments estimation, which is in general 
undesirable. We now aim to write down the (approximate) distribution of the EMAF to be able to do inferences more efficiently. 
We introduce the normalized AF given by: 

Ai^\^)^^j^ . (19) 

lC\'^{v-N,At) 

This normalized version of the AF is sensible considering vsj:{A'^\v)} — 0(1). If we are sufficiently far from D = Vi^L) 
(for a reasonably chosen L £ M+) then the expected modulus square of Ai^\iy) is 0(1) while evaluated at the present 
components in V will be increasing with Af. 

Referring to Eqn ([T6| we can rewrite At{v) as a sum of M^^tn). [27] discussed the distribution of At{v) for an underspread 
process. We have in this paper assumed that {X„}„ is a sample from a Gaussian process, and so ]\f^(t„) is a quadratic form 
and will be a weighted sum of two x^'s [28]. Ar{v) is therefore an aggregated sum of correlated weighted sum of two Xi's- 
To ensure that {Ar{v)} is a collection of Gaussian random variables the correlation of {X„} across n must be moderate. We 
also want to ensure that too much mass does not concentrate on one of the aggregated variables. Under such conditions Ar{v) 
is approximately Gaussian. 

^The factor of l/(2At) is expected as the bandwidth of the analytic signal. We only need to recover the AF as a function upto proportionality and can 
easily multiply our estimate by 2At. 
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Fig. 2. Estimated ambiguity function of tlie realizations; the first is the aggregation of a locally stationary and a cyclostationary signal, the second a simulated 
blood flow signal, the third the pipistrellus signal and the fourth is the oceanographic signal. 



Corollary 1: Magnitude of Ambiguity Surface 
The distribution of of the normalized AF represented in terms of its real and imaginary parts or 

takes the form 
1) If (z.,r)^I?thenQjM^^xi• 
2) If (i.,t) G V then = \Br{i^)\- 



(20) 



O (2;^) , and var|Al^V)} = ^rif), reljl^^V)} = 'R-ri'^)- We have 

\Br{i^)f ^ {T(r)(^ - ^(^)), r - 4"^' / {Cr{^)lCr{u)) = 0{l). 

Proof: This follows by direct calculation from Theorem [T] ■ 
We therefore expect to observe a mixture as the distribution of the EMAF, as is born out by practical examples, see Figure |3] 
We let pN.At be the probability that a random coefficient is non-zero, with the desired sampling. For a white noise process 
the expected AF decays like t^^ and ly^^ for sufficiently large i> and r. If we add up the coefficients for which Ifcrj < N^^^ 
then these are 2N + 0{\og{N)). The total number of coefficients are 4iV^ and so the probability of hitting such a coefficient 
is 0{1/N). For stationary and uniformly modulated processes the situation is very much similar. 

Proposition 1: Distribution of the Empirical Ambiguity Surface 
If we draw a coefficient at random at local frequency and time {v, r) 7^ (0, 0) then if the random process satisfies TZr{i^) = 
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we have that 



is distributed as a mixture of central and non-central x^'s or: 



(l-PJV.At)^^^X2+PW.At^Hx2KrM 



(21) 



Proof: This result follows by direct calculation from Theorem [T| and Lemma [T| ■ 
Even if we expect decay in Vt{v) for larger r and i/, this quantity will normally take a typical value of V and so we may 
state the following result. 
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Fig. 3. QQ-plots of the real and imaginary parts of the renormalized ambiguity function, using V to estimate the superimposed lines, (a) is showing the 
simulated mixture of a cyclostationary and locally stationary process, (b) the simulated blood flow signal, (c) the chirping bat signal, and (d) the oceanography 
signal. 



Corollary 2: Distribution of Empirical AF 
If we draw a coefficient at random at local frequency and time iv^r) ^ (0,0) then if the random process satisfies TZriv) = 

2 



we have that 



is distributed as a mixture or: 



(22) 



This gives a distribution which is a simple mixture distribution of a central and a non-central X2- is still heavily 
overparameterised even if the variances are similar across {ly, t) because in addition to the two parameters pN.At and V, the 
sequence {B^{i/)} is not known. If however pN,At is reasonably small, than the number of B^{v) that we have to learn is 
limited. We notice that this falls into the framework of the modelling adopted by [29] of "needles and haystacks". Some 
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coefficients {At^\v)} have a non-zero mean but we do not know which coefficients these are, or how frequent they are. Our 
abihty to learn the AF will depend on how pN,At changes with N and At. Realistically we assume that there is growing 
complexity in the time series and that this is quantified by pN.At- Unlike the case of nonparametric regression (e.g. [29]) we 
do not have a set of N uncorrected coefficients and N observations, but normally have to estimate 2N x 2N ~ 1 correlated 
coefficients from N observations. 

Proposition 2: Correlation Structure of Ariiy) 
If {Xn} is a white noise process then the covariance of the AF is 

" ^ ■^yi—j2 g — 27rAt max(i^i ,'^2 ) — ^2 ) ^277 At max(i/i ,^2 — ^^2 ) " 



A 



i27r(ri — T2) 
x{N- max(ri, r2))o-''e""(-'^"-'i'(5,i J, 
Proof: See Appendix C. ■ 
If we fix Ti = T2 = T then if we take Vk — n'^\t\ retrieve uncorrected coefficients. If t is 0(1), we could take I'k = jf- 
For white noise we approximately obtain a set of N uncorrected random variables by taking {^^(z/fc)}^. This argument can 
be repeated for any order one value of r we should choose. The full set of coefficients {Ar{i'k)}k.T is almost like many 
redundant collections of uncorrected random variables. It is reasonable with the model of an ambiguity sparse process that 
the full set of coefficients {Ar{i'k)}k.T are approximately like 0{N) collections of uncorrected coefficients; otherwise the 
procedure is like a composite likelihood method. Other choices (e.g. Vk = ^) will also produce such redundant collections. 

IV. Inference Methods 



The determination of the statistical properties of the EMAF in Section III has now put us in the fortunate position where 
we can propose inference methods. For ease we now model each individual Br{v). To remove explicit dependence on C we 
now define Br{v) = \/C^Ay)Br{v). We model the normalized AF to be 

Br{v) ^ N"" {Q,al) = Qr{v)e-'^^^''\ (23) 

and note that ci^ increases directly with Ct{v). N'-^ (•, •) refers to the complex-proper Gaussian distribution [20]. We may wish 
to put our belief regarding time-frequency structure into this prior, but notice that modelling local time and frequency (i/, r) 
directly controls the smoothness of the global time and frequency, rather than modelling global time and frequency directly, see 
for example work by [30], which constrains the time-varying spectrum to sparsity. Furthermore for some (very few) locations 
the above prior is not reasonable: e.g. Aq{Q) will always be real- valued, but the effects of modelling this single coefficient 
incorrectly are negligible as it will anyway always be retained. We define the likelihood in terms of the parameters 

V^=(V p alf , B^{Br{vk)}irM). Q = {Qr{yk)}, A-jAt^'K)}- (24) 

These vectors are defined over indices r e [~{N — 1), [N — 1)] and Vk = 2NAt ^ ~ ■ ■ ■ i^- We write Ai^\iyk) — 

QT{t^k)e~^'^^'''"\ Q = {Qr(i^fc)}nand define the "ambiguity likeHhood". 
Definition 2: Ambiguity Likelihood 

We define the ambiguity likelihood for the ambiguity surface Ar{v) to be 

H^, S. A) = n i ^ ' V 5 (Br(.^fc)) + , V e ^2 I (25) 

We have coupled the sparsity of the ambiguity surface between the real and imaginary components. Secondly p5| ) is like a 
true likelihood for a subset of N coefficients if we have chosen {i^k,T) E Ti where Ti is chosen to break up the correlation, 
averaged over the choices of coefficients we could have taken, i.e. the disjoint sets {T/} such that U;T; gives the full set of 
T and i/k that we have calculated, see Proposition [2] 

Definition 3: Ambiguity Marginal Likelihood 
We define the ambiguity marginal likelihood to be: 

^(^; A) = n {^Q.K)^-^ + yf^^^(^^)^"^} ■ (26) 

This form is derived in Appendix D by integrating over the other variables. We define — arg^ max A)} . This 

maximum can be found by numerical optimization methods. We can now find posterior estimators of Bt{v), following [29], 
[31], and using the posterior median estimator (this has the advantage of corresponding to hard thresholding for certain ranges 
of the parameters). We wish to calculate the posterior distribution of the ambiguity coefficient, given the observed ambiguity 
coefficient. We start from (p5]l for a single coefficient and get: 



f (BAi^k),Mvk)) = ^-^e V 5{^BAuu)) + —^e V e . (27) 

V / ttV TT^VcrJ. 

'To calculate A^'^^ we need to know K,t(v) and therefore S^^\ For most processes S^^^ = ^ is a reasonable choice. 
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Calculating Br{i^)\Ar{i') does not make sense, as we are then thinking of Br{v) with the phase information of At-{v)^ and 
we shall instead estimate Qt{v) conditional on observing Qt{v), subsequently shrinking Qt{v) irrespective of the phase 
distribution of At{v). The posterior distribution of the ambiguity coefficient at a given location is given by: 

f{Qr{v)\Qr[u)) = (l-p(P-*)(i.))<5(Q.M)+p(P-*)H/l (Qr[u)\Q,{l 



h {q\Q) 



VA \^ V ) V 2 y ' a2 + V 

Proof: See Appendix D, and the posterior probability is given by 

With this distribution a convenient estimator is the posterior median, see [29], and we take rir{vk) = (^^^^2Ap2_ J xhe 
posterior median estimator solves 

1 / |0|(nicdian) \ /S 

^(g(„.cdia„)|Q) ^ 2 « (1 - Q)) + P^r''\^k)'^ I ' ' I , (28) 



2^ 

and Crii^k) — p'T'°^^\i^k)VT{j^k) ■ If Cr(i^fc) < | then the posterior odds are in favour of a zero-valued coefficient and the 
coefficient's magnitude is estimated as zero, otherwise the median is found from ( |28] l. Thus with 

[ if CrK)<5 

the posterior median estimator is therefore 

A^^''\iyk)^eAiyk)ArM. (30) 



The estimator in ( |30| is a shrinkage estimator, as long as |0r(j^fc)| < 1 (which will be the case see e.g. [29]), which converts 
to a smoothing operation in the dual-time domain: 

Mi'''\tn) = At ^ dr{s,n - tn)Mr(s™). (31) 

Eqn. pTj i directly mirrors ( |T5] l, except {a;r(tn)} has been replaced by the data dependent shrinkage procedure {6'^(t„)}. 
Thus we can interpret (|30]l as a data-driven smoothing of the raw sample moments {MT-(t„)}, using the estimated sequence 
{Qritk)}- To investigate the smoothing function more clearly we note that 9r{tn) — P j_ QT-{v)e'^'^'^'^^ dv, and we see that for 
each fixed value of r we define a different smoothing kernel. As p decreases the probability of thresholding a larger portion 
increases, and so most of the ambiguity domain is zeroed out. If p is anticipated to be a decreasing function in N the procedure 
will be consistent. See Figure [T] for an example of four different kernels we retrieve for four different examples. The first 
subplot shows an example of an aggregation of a cyclostationary and a locally stationary plot. The estimators limit the support 
in T, and shows seasonality in t as the cyclostationary features are reinforced. Subplot (b) shows a chirping signal, where it 
is advantageous to use all lags at the same global time, and similar features are replicated in the meddy signal (subplot (d)). 
For the chirping acoustic signal there is clear selectivity in both global and local time. 

Furthermore note that p is the probability that we will come up with a non-zero contribution. It is not equal to the area of 
T) divided by Cr{v) because we are not sure that all of T) is actually supported. It may also become necessary to relax (|22]| 
to allow for different distributions of variances. We can instead (if necessary) take a mixture model with P components and 
taking {Vp\p^i, allowing for stronger and weaker signals. 

V. Estimating the Time-Varying Spectrum & Valid Second Order Forms 

For stationary time series the spectrum is an inherently important analysis tool, see e.g. [16]. It shows the distribution of 
energy of a time series across frequencies, thus characterising the time series, permits inference via the Whittle likelihood, 
allows us to check if a posited autocovariance is valid, and we may even forecast future values directly from the frequency 
domain, see e.g. [32]. Eqn (|5]l defined a time-varying spectrum by Fourier transforming the local moment function. Whilst this 
appears to be a self-evidently simple extension of the usual spectrum, it fails to satisfy a number of desiderata, see e.g. the 
full discussion in [17] or [9], such as positivity. It can even be proved that any bilinear representation of the spectral content 
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Fig. 4. Estimated covariance structure of the signals. Tliese are estimated from tlie shrunk ambiguity function, and corrected to non-negative eigenvalues. 
Subplot (a) is showing the simulated mixture of a cyclostationary and locally stationary process, (b) the simulated blood flow signal, (c) the chirping bat 
signal, and (d) the oceanography signal. 

of the signal must fail some of the desiderata that are required for a time-varying spectrum. We start by defining an estimator 
of the time-varying moments of Zt by 

AT-l 



k 



2NAt 



k=-N 



2NM 



(32) 



In section |n] we discussed various definitions of the time-varying spectrum corresponding to special cases o^ 



N-l N-1 



T=-(Af-l) fe=0 

By changing the definition of {lUt (i)} and a we will obtain different Fourier representations of the sequence MT-(t"). The 
utility of any particular bilinear representation will depend on the analysis problem in question. There is at times in signal 
processing confusion as to the relation of Eqn ( [T7] i (or Eqns ( fTS) & ( |3T| i, chosen to smooth Mr{tn)) with ( |33] l (or Eqn (|6|, 
defining a different time-frequency representation). The reason for this is simple; the equations appear to be performing the 
identical action. Because any bilinear representation {e.g. ( (33] l) S'"^''^^ /) can be written as a convolution of any other 



"^The indicator function can be removed and Mr (t " ) interpolated instead. 



UNIVERSITY COLLEGE RESEARCH REPORT 310 



12 



5"=^'"^ (t„, /) (see e.g. [8, p. 187]), at times any bilinear representation is viewed as an estimator of any other, see e.g. [33, 
p. 299]. However in signal processing the sequence {wt (ife)} is chosen to improve the visual appearance of S"''^{tn,f) 
rather than considering the estimation properties of a sample version - unlike tn^(tfe) and 9r{tk). L34J is a notable exception, 
but no practical estimation schemes are proposed in that paper, as knowledge of the higher order moments are required for 
implementing these ideas. We have separated the estimation of M^(f5^) from the (mathematical) choice of representation of 
this object, once estimated. 

What we have failed to discuss is the validity of any given estimated covariance sequence {M^(t")}. If we arrange M,-(t") 
to represent the covariance of the vector Z = (Z^^ . . . , it follows that A = E {ZZ^} = A ({^^(i^)}) , should be a 
valid covariance matrix, e.g. all its eigenvalues must be non-negative. For a stationary time series this can also be ensured by 
requiring the Fourier transform of the autocovariance sequence to be non-negative, and it is commonly considered a desideratum 
also for the time-varying spectrum. For a nonstationary series it is unrealistic to assume that a time-varying spectrum will be 
non-negative. 

The raw method of moments estimator we started with, e.g. A = ZZ^ = A ^{Mr(f")}^ , has only one eigenvalue 
corresponding to the total energy Z^Z, and all other eigenvalues are identically zero. We create an alternative estimator of 
A from M!f^'^^\tn), this yielding A^'^'^^^^ Unfortunately A^*^'^''^) does not necessarily have positive eigenvalues, nor is it in 
general sparse, where the latter could be used to ensure positivity. We may "correct" the estimator by two possible methods. 



raw moments raw moments 




200 400 600 SOO 100 200 300 400 500 

time time 

(c) (d) 

Fig. 5. Raw moments estimates of the covariance structure of the signals. Subplot (a) is showing the simulated mixture of a cyclostationary and locally 
stationary process, (b) the simulated blood flow signal, (c) the chirping bat signal, and (d) the oceanography signal. 

as follows. First we calculate the eigendecomposition A^*^'^'*'') = UTU^, and then we correct it using one of these two 
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methods: A^"'"''-''^ = U [ Y - {minj } I] or A^''^''''^ = U[Y]^U^, where {vj} are the eigenvalues of A^<'^<^y\ and 
[T]^ is thresholding the entries of the diagonal matrix at 0. Both estimators (e.g. A^'^''^'*^'' and a'^^^^'') are valid covariance 
matrices whilst A'^^'^y) is not. The estimate of the time-frequency spectrum produced by either of these matrices is however 
very similar in most cases. Using A^^'^'^^) without any adjustment is like estimating a variance to be negative, and is therefore 
not to be recommended. 



VI. Examples 

We consider both simulated and real data examples. The main purpose of this section is to show the performance of our 
proposed method, and using any particular choice of representation, as well as a given value of a. We stress again that in our 
opinion there is no optimal choice of representation or a, but rather each representation has clear advantages and disadvantages 
for different processes, once the local moments of the time series have been estimated. 

Let us start with an extremely sparse signal, defined by Xt = y}^^ + Y^^\ v}'^^ — X]f=o '^I'^^'et-i, t — 0, . . . , N — 1 
for fc = 1,2, with w^^^ = (l 0.33 0.266 0.2 0.133 0.066)'^, 

wf^ = (l 0.5 0.3 O.l)^ andcTj^^^ = l + -^), cr^^ = 4|sin(27r x 0.09t)| . The first of these two processes 

is a locally stationary process and the second a cyclostationary process. Their aggregation is neither locally stationary nor 
cyclostationary at the sampling we are looking at the signal. We simulate the signal to be length 512. This signal was considered 
using universal thresholding in the ambiguity domain by [27], which need not produce a valid estimator When analyzing it 
with the mixture model the estimated p is very small; p — 4.6 x 10^^, a number corresponding to about 50 pixels in the 
redundant representation. A plot of the estimated AF is given in Fig. |2|a), and the estimated moments are plotted in |4|^)- 
The AF is in this instance very sparse indeed with only some contributions near the origin and the cyclostationary frequencies 
0.18 = 2 X 0.09. The difference between the raw and estimated moments (e.g. [5]; a) vs|4|a)) is marked. We are here using one 
of the valid estimators of the covariance sequence, namely A2'''^'*'*'''. Looking at raw characteristics of the renormalized real 
and imaginary part of the AF clearly fails to indicate mixture components, see Figure [3|a). This is because of the very high 
degree of sparsity of the AF. The sample Wigner distribution is much too noisy to be useful, see Figure |6| a). Smoothing using 
the method outlined in this paper keeps the high-frequency cyclostationary component, see Figure |6|b). The spectrogram fails 
to represent the cyclostationary component, see Figure |6|c). No matter what sophisticated inference procedure we would use 
on the raw spectrogram this procedure would clearly fail to retrieve the cyclostationary component. Because this is a simulated 
signal we can look at the normalized error of the proposed estimator, versus the normalized error of the raw covariance 
estimator - see Figure |7|a)-(b). Comparing the estimated eigenvalues with the truth (Figure |7|c)) shows that the ambiguity 
domain shrinkage is regularizing the eigenvalues substantially, and the negative of the eigenvalues have here been set to zero to 
make the estimator valid. Figure |7jd) shows how extremely noisy the raw moments are, compared with the estimated moments 
(down at the very bottom of the plot). As a final point of interest we show {9r{t)} in Figure [ij a). This demonstrates how the 
estimated "smoothing filter" is reinforcing cyclical patterns with the right period, but shrinking most possible correlations. 

The second simulated example is a modified version of the simulated signal analysed in [35]. The signal has simply been 
reduced in frequency range and subsampled, but takes the form: 

M 

Zn=At ^ Vn-khkji + Vn, (34) 
l=-AI 

where {t'n} and {?7„} are independent white noise processes, M is a positive integer, with {hk.n}k a set of sequences defined 
for each value of n. This signal has a strong oscillatory pattern with chirping period, due to the blood flow being basically 
"forward" during systole and "reverse" during diastole. In this action a number of frequencies are temporarily visited. This 
oscillatory structure is reinforced by|2|b) showing a sloping linear structure. Because of the oscillatory structure of the signal 
it shows presence in most of the time-time domain - c/|4]^), but these are smoother versions of |5|b), again guaranteed to be 
positive semi-definite sequences by correcting eigenvalues. There is a clear two-population mixture in the renormalized data, 
c/ Figure [3jb), and the estimated probability of belonging to the signal component in the mixture is 0.086. Looking at the raw 
Wigner distribution (Fig.[6|d)), the spectrogram (Fig. [6|f)) and the smoothed representation of the estimated moments using the 
shrunken AF (Fig.|6|e), again using ( (33] l for nicer visual characteristics with ujT{t) corresponding to an appropriate combination 
of Hermite windows, see [36]). We see again how learning the smoothness of the data from the AF can substantially improve 
the estimation. The time domain "smoothing filter" is now less sparse but has an intrinsic width in t, cf Figure [TJb)- 

The next signal is a portion of a recording from a Pipistrellus batHThe full recording is quite long - over 59,417 time 
points, and we focus at a section towards the end of the recording. Fig7|2|c) shows the sparsity of the signal in the ambiguity 
domain, which would not be well approximated if limited to a small box near the origin. The two estimates of the covariance 
are given in Fig |^and |5] where highly localized events in time are detected. The quantile/quantile- (qq-) plot Qc)) shows two 
populations, and 9 = 0.21, which is quite high. The raw Wigner plot is very noisy, and we prefer to here show the Rihazceck 
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Fig. 6. Time-frequency representations of some of the sample signals. Subplot (a) is the raw Wigner distribution on a dB scale of the simulated mixture 
of a cyclostationary and locally stationary process, (b) is the smoothed Wigner distribution of the moments estimated from Ambiguity domain and (c) is the 
raw spectrogram from the data. Subplot (d) is the raw Wigner distribution on a dB scale of the simulated blood flow signal, (e) is the spectrogram of the 
moments estimated from Ambiguity domain and (f) is the raw spectrogram from the data. 
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Fig. 7. Comparison of estimation en'or for a single realization of the aggregation of the cyclostationary and locally stationary process. The first shows the 
normalized estimation error using the EBAYES method combined with correcting the eigenvalues, the second the normalized estimation error using the raw 
moments. Subplot (c) shows the eigenvalues of these two methods. The solid thick line of one nonzero eigenvalue is the sample moments, the solid thin line 
the theoretical eigenvalues and the dot dashed is using the EBAYES method combined with connecting the eigenvalues. Subplot (d) shows the normalized 
estimation error for a single row of the matrix, where the EBAYES method is seen as superimposed over the x-axis, and the dotted line is the raw moments. 



distribution of the signal (see Figure and (b)), corresponding to a raw Fourier transform of the time-shifted moments with 
no additional windowing for better representation. The spectrogram loses precision, see Figure [Hj^c). The actual smoothing 
filter is quite sparse, but informative, see Figure [Tfc), highlighting the cyclostationary nature of the signal. 

The final signal is from an application in oceanography, see [37]. We propose to analyze experimental data corresponding 
to the velocity from a Lagrangian drifter. There are more components present in the data than merely the signal from a vortex 
(see e.g. [38]). We plot the estimated AF and covariance in Figures ^d) and |4|d). Again this is a highly oscillatory signal 
correlated over long time intervals. The qq-plot of the AF (Fig [3]^c)) now almost breaks the 2-component model in favour of 
a 3-component model, and 6 is relatively high, namely 0.31. The raw Wigner distribution |8jd) is very noisy, and it is hard 
to make out the presence of distinct components. We look at both the spectrogram and the smoothed version of the Wigner 
distribution (again smoothing with Hermite windows) (Figs[8jf) and[8]^e)) and see that whilst the Wigner distribution has some 
issues with interference in the representation (at about / « 0.11), the localization of the instantaneous frequency path is much 
improved compared to the spectrogram. These paths are used to define summaries of the data (see e.g. [38]). The temporal 
smoothing filter uses much of the local structure to improve estimation. 



VII. Discussion 

The aim of this paper was to introduce a new class of nonstationary signals, and propose appropriate inference procedures 
for their second order properties. This was done by introducing ambiguity sparse processes, assumed to consist of a collection 
of high intensity regions in the ambiguity domain. Inference was implemented directly in the ambiguity domain, and estimators 
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Fig. 8. Time-frequency representations of some of tlie sample signals. Subplot (a) is the raw Wigner distribution on a dB scale of the pipistrellus signal, (b) 
is the spectrogram of the moments estimated from Ambiguity domain and (c) is the raw spectrogram from the data. Subplot (d) is the raw Wigner distribution 
on a dB scale of the oceanographic signal, (e) is the spectrogram of the moments estimated from Ambiguity domain and (f) is the raw spectrogram from the 
data. 
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were transferred back into the time domain, so that they could be corrected into valid estimators of second order structure, 
an action that is not equivalent to requiring the time-frequency distribution to be real and positive. In general for an arbitrary 
harmonizable process the sinusoids will not diagonaUze the covariance matrix of the observed sample, and so positivity of 
the "spectrum" cannot be a requirement to specify a valid process. The estimators we propose of second order structure can 
however still be represented in terms of a time- varying spectrum, using any choice of such an object that is suitable for the 
process in question, once a valid covariance has been estimated. In practice we would first estimate the time- varying moments, 
correct them into valid estimators, and then subsequently choose from possible time-varying spectral representations. In our 
experience the correction has little effect on the pictorial impression of most time-varying spectra, as this de-facto seems to 
add a positive constant to the time-varying spectrum. Most sunnmaries in the time-frequency plane compare relative magnitude, 
making this uniform level change of little interest. 

In statistics interest has focused on various procedures using the spectrogram, or variants thereof {e.g. [5] or [3]), and this 
has a number of desirable properties such as positivity of the spectrum, but also a number of shortcomings, see [9], mainly in 
terms of what processes can be treated well by such a framework. We have not solved the (intractable) problem of producing 
a time-varying spectral representation for an arbitrary nonstationary process, but nor is it reasonable to expect to do so. 

A common complaint in statistics is that computing time-varying spectra easily degenerates into making "pretty pictures", 
and is not really an important inference problem in its own right. However in many practical problems summaries of time series 
are defined directly from their time- varying spectrum, we mention in particular oceanography [39], the analysis of biomedical 
signals, e.g. [40]-[42], and various branches of physics. To be able to obtain good summaries, the estimate of the time- 
frequency representation must itself be sufficiently good and the representation must be chosen appropriately, not smoothing 
out important features of interest. A great weakness of existing methods is the necessity of assuming many smoothness 
properties of the covariance of the signal before estimation. The difference between white noise and an interesting signal 
is exactly structure and concentration: and allowing a more wide range of possible structure can help us detect otherwise 
less easily characterized behavior. For stationary time series estimation was enabled because we assumed that the sinusoids 
corresponded to the eigenvectors of the covariance matrix, and so the positivity of the spectrum was required for a process to 
be valid. We reiterate that for an arbitrary harmonizable process there is no reason why this should "nearly" be the case. 

Comparing with other methods that use sparsity to estimate time-varying spectra, such as [30], [43], is that we assume 
sparsity in the ambiguity domain rather than in the time-frequency domain. Furthermore we have provided a stochastic model 
for the types of signals where this inference method will be appropriate. By thresholding or correcting the eigenvalues of the 
eigendecomposition of the estimated covariance matrix, our estimated matrix is guaranteed to be a vaUd covariance matrix, 
another problem with many existing methods. Excessive model flexibility is a curse; we cannot anticipate that it is possible 
to estimate any arbitrary nonstationary process. We relax the straight jacket of excessive smoothness so that we can at least 
estimate a larger class of nonstationary processes to encompass more realistic data sequences. In many applications as the 
sampling rate is matched to the actual bandwidth of the observed phenomenon, excessive smoothing will never recover the 
phenomena of interest, and more of the structure of the signal needs to be modeUed and utilized, as otherwise several important 
features of the data will be smoothed out. 

Acknowledgement 

The author gratefully acknowledges the financial support from the EPSRC via EP/I005250/1 and many enlightening discus- 
sions with Dr Heidi Hindberg, Norut, about nonstationary processes. 

References 

[1] M. B. Priestley, "Evolutionary spectra and non-stationary processes," Journal of the Royal Statistical Society, B, vol. 27, pp. 204-237, 1965. 

[2] R. Silverman, "Locally stationary random processes," IRE Trans. Info. Theo., vol. 3, pp. 182-187, 1957. 

[3] R. Dahlhaus, "Fitting time series models to nonstationary processes," The Annals of Statistics, vol. 25, pp. 1-37, 1997. 

[4] , "A likelihood approximation for locally stationary processes," The Annals of Statistics, vol. 28, pp. 1762-1794, 2000. 

[5] H. Ombao, J. Raz, R. von Sachs, and B. Mallow, "Automatic statistical analysis of bivariate nonstationary time series," /. Am. Stat. Assoc., vol. 96, pp. 
543-560, 2001. 

[6] S. G. Mallat, G. Papanicolau, and Z. Zhang, "Adaptive covariance estimation for locally stationary processes," Annals of Statistics, vol. 26, pp. 1^7, 
1998. 

[7] G. P. Nason, Wavelet Methods in Statistics with R. Berlin: Springer, 2008. 

[8] P. Handrin, Time-Frequency/Time-Scale Analysis. New York: Academic Press, 1999. 

[9] L. Cohen, Time-frequency analysis: Theory and applications. Upper Saddle River, NJ, USA: Prentice-Hall, Inc., 1995. 
[10] M. Lofeve, Probability theory. New York, USA: Van Nostrand, 1963. 

[11] K. Karhunen, "Uber lineare methoden in der wahrscheinlichkeitsrechnung," Ann. Acad. Sci. Fenn Ser A, I. Math., vol. 37, pp. 3-79, 1947. 
[12] G. Matz and et al., "Generalized evolutionary spectral analysis and the Weyl spectrum of nonstationary random processes," IEEE Trans. Signal Proc, 
vol. 45, pp. 1520-1534, 1997. 

[13] G. Matz and F. Hlawatsch, "Nonstationary spectral analysis based on time-frequency operator symbols," IEEE Trans, on Information Theory, vol. 52, 
pp. 1067-1086, 2006. 

[14] M. Jachan and et al., "Time-frequency arma models and parameter estimators for underspread nonstationary random processes," IEEE Trans. Signal 

Proc, vol. 55, pp. 4366-4381, 2007. 
[15] R. J. Adler and J. E. Taylor, Random Fields and Geometry. Berlin: Springer, 2007. 

[16] D. B. Percival and A. T. Walden, Spectral Analysis for Physical Applications. Cambridge, UK: Cambridge University Press, 1993. 



UNIVERSITY COLLEGE RESEARCH REPORT 310 



18 



[17] R. M. Loynes, "On the concept of the spectram for non-stationary processes," / Roy. Stat. Soc. B, vol. 30, pp. 1-30, 1968. 

[18] W. Martin and P. Flandrin, "Wigner-Ville spectral-analysis of non-stationary processes," IEEE Trans. Acous., Speech & Signal Proc., vol. 33, pp. 
1461-1470, 1985. 

[19] J. Jeong and W. J. Williams, "AUas-free generalized discrete-time time-frequency distributions," IEEE Trans. Signal Proc, vol. 40, pp. 2757-2765, 1992. 
[20] P. J. Schreier and L. L. Scharf, Statistical Signal Processing of Complex-Valued Data: The Theory of Improper and Noncircular Signals. Cambridge, 

UK: Cambridge University Press, 2010. 
[21] A. Rihaczek, "Signal energy distribution in time and frequency," IEEE Trans. Inf. Theory, vol. 14, pp. 369-374, 1968. 
[22] R. E. Blahut, W. Miller, and C. H. W. (ed). Radar and Sonar, Part I. New York, USA: Springer Verlag, 1991. 
[23] W. Martin, "Line tracking in nonstationary processes," Signal Processing, vol. 3, pp. 147-155, 1981. 

[24] H. Tong, "Some comments on spectral representations of non-stationary stochastic processes," / Appl. Proh., vol. 10, pp. 881-885, 1973. 

[25] , "On time-dependent linear transformations of non- stationary stochastic processes," / Appl. Prob., vol. 11, pp. 53-62, 1974. 

[26] L. Isserlis, "On a formula for the product-moment coefficient of any order of a normal frequency distribution in any number of variables," Biometrika, 
vol. 12, pp. 134-139, 1918. 

[27] H. Hindberg and S. C. Olhede, "Estimation of ambiguity functions with limited spread," IEEE Trans. Signal Proc, vol. 58, pp. 2383-2388, 2010. 
[28] N. Johnson and S. Kotz, Continuous Univariate Distributions, Vol. 2. New York, USA: Wiley, 1970. 

[29] I. M. Johnstone and B. W. Silverman, "Needles and straw in haystacks: Empirical Bayes estimates of possibly sparse sequences," The Annals of Statistics, 
vol. 32, pp. 1594-1649, 2004. 

[30] P. J. Wolfe, S. J. Godsill, and W. J. Ng, "Bayesian variable selection and regularization for time-frequency surface estimation," / Roy. Stat. Soc, b, 
vol. 66, pp. 575-589, 2004. 

[31] X. Wang and A. T. A. Wood, "Empirical Bayes block shrinkage of wavelet coefficients via the noncentral chi2 distribution," Biometrika, vol. 93, pp. 
705-722, 2006. 

[32] J. Haywood and G. T. Wilson, "Fitting time series models by minimizing multistep-ahead errors: a frequency domain approach," / Roy. Stat. Soc, b, 
vol. 59, pp. 237-54, 1997. 

[33] L. L. Scharf, P. J. Schreier, and A. Hanssen, "The Hilbert space geometry of the Rihaczek distribution for stochastic analytic signals," IEEE Signal Proc. 
Lett., vol. 12, pp. 297-300, 2005. 

[34] A. M. Sayeed and D. L. Jones, "Optimal kernels for nonstationary spectral estimation," IEEE Trans. Signal Proc, vol. 43, pp. 478^91, 1995. 

[35] S. C. Olhede and A. T. Walden, "Noise reduction in directional signals illustrated on quadrature doppler ultrasound," IEEE Trans, on Biomedical 

Engineering, vol. 50, pp. 51-57, 2003. 
[36] I. Daubechies, "Time-frequency localization operators - a geometric phase space approach," IEEE Trans. Info. Theory, vol. 34, pp. 605-612, 1988. 
[37] P. L. Richardson, A. S. Bower, and W. Zenk, "A census of Meddies tracked by floats," Progress in Oceanography, vol. 45, pp. 209-250, 2000. 
[38] J. M. Lilly and S. C. Olhede, "Bivariate instantaneous frequency and bandwidth," IEEE Trans, on Signal Processing, vol. 58, pp. 591-603, 2010. 

[39] , "On the analytic wavelet transform," IEEE Trans, on Information Theory, vol. 57, pp. 4135^156, 2010. 

[40] M. Unser and A. Aldroubi, "A review of wavelets in biomedical applications," Proc. IEEE, vol. 84, pp. 626-638, 1996. 

[41] D. V. de Ville, T. Blu, and M. Unser, "Surfing the brain - an overview of wavelet-based techniques for fMRl data analysis," IEEE Engineering in 

medicine and biology magazine, vol. 25, pp. 65-78, 2006. 
[42] S. D. Cranstoun, H. C. Ombao, R. von Sachs, W. S. Quo, and B. Litt, "Time-frequency spectral estimation of multichannel EEG using the auto-SLEX 

method," IEEE Trans, on Biomedical Engineering, vol. 49, pp. 988-996, 2002. 
[43] P. Flandrin and P. Borgnat, "Time-frequency energy distributions meet compressed sensing," IEEE Trans. Signal Proc, vol. 58, pp. 2974-2982, 2010. 
[44] 1. S. Gradshteyn, 1. M. Ryzhik, A. Jeffrey, and D. Zwillinger, 77ie table of integrals, series and products, 6th edition. Academic Press, 2000. 



A. Proof of Theorem 1 

A. Expectation of the EMAF 

We start by noting that (see [27, p. 2384, eqn. 4]), but adjusted to a sampling period not necessarily set to one: 

. j_ „ ^ f 

2A* / 2At J 



^A{v, t]}^ J' J ' e*"(^-i+")^*('^'-"'DAr_|,| {At{iy' - ly)) e^"i'^'S[y\ /) dv' df, (A-35) 

with the usual definition of the scaled Dirichlet kernel of [16, p. 102]: 

„ , , sinfvriVi^) 

DnH^ .\ / ■ (A-36) 
sin(7ri/j 

By definition the dual frequency spectrum can be written in terms of the AF as: 

oo 

T — — 00 

We then have that: 

At ^ A,,(z/')^iv-|r| (A<(z/'-t/))e"(^-i+")^*('^'-"'e,._,(i^') t^^', (A-37) 
where the sequence e,- is now defined by: 

J max{ — 1/,0) 

with (as usual) sine (/) — sin(/)//. Note that eT-(i^) therefore has a n implicit and homogeneous dependence on At. We 
can write the expectation of the EMAF in terms of the AF, see ( |A-37[ ). We see that the theoretical support of the ambiguity 
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function, given by the points where |v4,-/(i/')| is non-neghgible is "smeared out" when sampled at a fixed A< and N by the two 
kernel functions Djy^^ {v' — v) and e^i-Jj/). The effects of this convolution must be further investigated when modelling the 
structure of the AF. The value of Eqn ( |A-37| i sampled at unit sampling intervals for popular models like stationary processes, 

s, is normally OyN 



or uniformly modulated processes. 



Ily 0[N - I r|), see [27], but depends on the model for Ar{v). 



^1 oo 



(i) 



,(iV-l + T)At(^'-^)^ 



j,{N -1 + T)&t{u' -v) 



/_°Io '*"'/At<''')°iV-|T| (Ate-'' - '^)) ^TW^'V- d,/ du' + O (AtlA^Ci-)!) . (A-39) 



If we are far from the K singularities then we find we can Taylor expand the smooth function A.^i /AtW) in a Taylor series 



A, 



(A-40) 



We take v' = v A- ^i/T (to catch the contributions of the Dirichlet kernel) and u' = u + ^2^t, where T{t) (N ~ \T\)At 
(suppressing the r dependence of this variable as appropriate). 
For large N and small At we therefore find 



sin(7r^i) 

tTr{N-l + 



N 



1 



sine (^n{u ~u){^ 



\2At 
1 

2Ai 



-W + ^i/T\) 



i7r(Af-l+r)5ig-i7rC2(^-i'At-4 



^/^) Siii (7rg2(i-kAt + ei/^l)) 

77^2 At 



Therefore the integrals become 

eIa.h} = r r 

^ J —oo J — oo 

/ siii(7r^i) 



Au/Atil^) + ^Au/AtH ■ [ 1^ 



o 



oo /'OO 



oo J — oo 

oo /'OO 



1 

N 
sin(7rgi) 



,j:;r(JV-l+T) 



jv e 



g«7rCig-i7r^2(; 



(i-.At) / sin(7r6(^-t.|At|)) 



-.At) Sin«2(|-^|At|)) 
7rC2At 



da 

T 



da At 



— ^^(i/) / / /(wi < 1)/(W2 e [max(-zyAt,0),min(- - i^At, -)])d2a; 

^' J-oo -oo ^ ^ 

1 



At 

ArM X 1 X ( ^ ^ 



(A-41) 



Thus the expectation of this estimator far away from the singularities depends both on the sampling rate At and how close 
we are to the Nyquist frequency. For the points close to the singularities we instead use the model and the concentration of 
the AF in order to derive its expectation. We use 



S('=)(i/,t/7V) 



At^ii^- i^^J"^ 



Ar2 



(A-42) 



where B'^^\v,u) is a twice differentiable function. We write [v^ , Tq ) = (vq , Uq N). We can now rewrite the integral 
using a change of variables of 



(fe) , Ai 



T(ry 



Tiry 



T -T, 



(fc) 



At(u-uf), i 



^0 ' 
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therefore ending up with (for some suitable choice of 1 > 7 > 0) 



f- 



\2At 



— DN_\r\ [ At 



^„(jV„l+^)At[%^]^_„(4,_A2)At(3i^-(,.+ «i^)); 



N-\t\) ^ \N ) 

. Sin (vrfe - M)Ati^^ - l'^ + ^ 



T(r) 



^(6-A2)(i-|j^At|) 



(A-43) 



A term is added in ( |A-39[ )(1) because there is an error due to the Riemann approximation to the sum. Subsequently (step 
( |A-43[ )(2)) there is a change of variables. Let Ariv) = X]fe ^t*^' l'^)' separating the components from each "island" of 
contribution. Thus the component renormalized ambiguity is, ignoring terms because we take only the first parts of a Taylor 
series of B'^^\v.t/N), as well as the Dirichlet kernel: 



1 



N'' AN-\t\)'' 
-Ni J-{N-\t\)-< 



sin(7r(^i - ,^[^,^x,) 



Ail - Ai) 



■ ic \ Ml At \ sin (TrfA — Aolfir — I Aiz^D) 



T^{i2-\2){\~\^tv\ 

Note that changing the limits in the integral from —N'' to —00 makes no appreciable difference as the contributions that have 
been added behave like: 



1 



1 



-25- 



J J rcos(0)-Airsin((?)-Ai 7^,-, 2^^ Jiv. 

thus we need 7 > 0. We also need to convince ourselves that the integral is really 0(1) as claimed, and let 



N 



-1 



N 



(A-44) 



.Jl(5;A) = 



2At J-oo J-o 



t(5i - Ai) 



t(52 - A2)(A - |Ati.|)(l - lAt^l) 



It follows J'l is convergent if < S^'^^ < |, which can easily be shown directly by splitting the range of integration, and 
bounding each component (using the long-range decay), by implementing the integration in one variable after the other and 
then using the long range decay. The special case of (5^'^' = 5 is also fine, and this can be seen from using the fact that the 
2-D Fourier transforrr0 The FT with canonical variable u) of 



sin(7r^i) 



gi7rClg-«7r?2(; 



is a band-pass filte:]^ /(cji e [0,1]) and /(cjs e [max{~Atiy,0) , min(i — Atiy, ^)])- We therefore see that the renormalized 
variables ^1 has a canonical variable restricted so that the FT of 1/ is restricted to [0, T] which is exactly the interval we 
have observed the data over (and as the data is nonstationary we cannot go beyond that interval). The second variable ^2 is 
restricted in frequency to a range that is limited because of the act of making the signal analytic. Using [44, 6.561.14] with 
u = \J iJ\ as the radius of the Fourier variable and requirin 



00 fOO 



— oc — 00 







2tv X 



2I-2 

(27rtc; 
2 



(A-45) 



< ^c-) < 1 



We need to define the Fourier transform carefully as has been a frequency variable and §2 a time variable. The FT therefore has the opposite sign 
in the complex exponential for the two variables. 

'You would expect uii to range in [—1/2, 1/2] but because we observe t in [0, T] this would not be the case. 

**We numeiically approximate finite length sums using infinite domain integrals, however this makes the FT periodic in lji. 
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Thus using Plancherel's theorem and the change of variable 2Tru) = u;^ 



M6;X) 



27r 
At 

47r 
At 

1 

At 



/■27r /■27rmin(i i-|i/|At) 



J27rmax(0,-Atiy) 
oo poo 



^0 

oo 



r((5W) 



2i-2.(-'^-i+2.'-) r (1 ^ 1 ^ ^ 1 



r ((jc^)) 



if (5^*^) G [1/4, 3/4]. This should be compared to Eqn ( |A-41| l, and we see that if the AF is sufficiently concentrated we no longer 
have a loss of information proportional to 2Si ^ 1^1- '^^^ interested in the second order structure up to proportionality, 

the multiplication by does not concern us, and could easily be corrected for The truncation in the canonical variable of 
^1 exactly implies that the time variation is (for not normalized variables) truncated onto [0,T]. The second truncation is in 
the frequency variable / (once renormalized), which just ensures \f v > Q that < 012 < (1/2 — I^^^^D- 

In theory we also wish to include stationary processes, as well as uniformly modulated processes in the class of investigated 
processes. A sample from an analytic stationary process with autocovariance sequence Mr has an Empirical Ambiguity function 
with expectation [27]: 

y^r[y) = i?^_|,|(Atzy)M,e-*"''^*(^+"-i), (A-46) 

(compared with the infinite sample Ar{v) ~ S{iy)Mr) whilst a sample from the analytic signal of a uniformly modulated 
process with the Fourier transform of the analytic extension of the modulation function corresponding to S(i^) has expectation: 



MrM = f ^ - W\ ] S(j^)sillC 



\2At 



(A-47) 



compare with the infinite length sample Ariiy) — St-^{i'). Both these means decay away from the point (0, 0), if with different 
decay in ly and t. These can be approximated by the ellipsoidal decay model (see Eqn (8) in the paper). 



B. Variance of the EMAF 

An expression for the variance of a general harmonizable process is given in [27, Eqn A-4], adjusted to the case of At 7^ 1 
namely: 



var 



2At / 2At / 2At / 2At 



^{dZ{h)dZ*{f2)dZ*{a^)dZ{a2)} 

Ja Jo Jo 

i27i-(/i-ai)rAt i7r(/i-/2-Qi+Q2)(W-l-r)At , 



e ^' e 

DN_\r\iAt{ai - 02 - I'))- 
We shall simplify this expression somewhat. We have that with 

fi='^' + /, /2 = /, ai = I'" + /', a2 = /'. 



i?^_|,|(A<(/i-/2-i^)) 



var{l^(:/)} + E{l^(zy)} 



2At / 2At 



-/ 



2At / 2A 



iAt f 



J-f Jo J-f 

2»^/rAtg |^^(^, + f)dZ* {f)dZ* [v" + f)dZ(f)} 

At 



e 

We assume that {dZ{f)} is a Gaussian process and then use Isserlis' theorem to obtain that 



var 



Ar{v)]^ / / / e"(^-i+^)^*(''-^)i?^_|,|(At(z.'-i.))e2*'^/^^* 

(E {dZ{y' + f)dZ* {v" + /')} E {dZ* {f)dZ{f)} 
+E {dZ{,.' + f)dZ{f')} E {dZ* {f)dZ* {u" + /')}) 



(A-48) 



(A-49) 



'We now get an extra factor of 2 from the periodicity in uji . 
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Given we have assumed propriety of Z{t) we obtain that E{dZ{u' + f)dZ{f)}E{dZ*{f)dZ*{u" + /')} = 0, and so the 
remaining term gives us: 

1 1 j- 1 1 .p/ 
E {dZ{,,' + f)dZ* (i/" + /')} E {dZ* if)dZ{f)} 

1 1 J 1 1 f' 

f 2At r 2At /* 2Ai r 2At ^ » 

s{v' + f-v"-fy + f)s*{!-rj') 

We can re-write this in terms of the ambiguity function and redefine v' as well as v" to have 



var 



oo 

At J2 A,,(zy' + /-z/"-/')e-2^-^'^*(""+^'+") 



r' — — oo 



xAt ^ A%{f-f')e^"^"^'f' 

r"=—oo 

^-i^iN-l+r)Atu" ^/^t^") e-^i-f rAt ^^// 

We need to put in our knowledge for when At'{u' + f — u" — f')A*„{f — /') is large to simphfy this expression. We start 
by a change of variables 

uj = iy"-v', v' = s, f-f'^u, f^b. 

With this change of variables we have that (with the hmits {Ip} depending on the outer integral dummy variables, as well as 
u, and are in general complicated objects) with T = T{t) = At{N — |t|): 



fh fh fh fie ^ 

= / / / / At ^ A^ ^ 



var 



T — — oo r"=— OO 



Ar>{u - ^)p-2--'At(s+u,-«+6+-)^*„(y)e2ixT"At(-«+6) 
g_i^(jV-l+r)At(s+c.)£,^_l^l (^^(g ^ ^■j) g-2ix(-«+b)rAt ^^^^ 

We implement yet several other changes of variables 

(i) , (k) , 

to focus on the locations close to the finite number of singularities 

,.1-, ,. OO oo 

var{A.(.)} = EE/ / / / E A* E 

k I Js=l3 Jb=h Jik Jm t'=-oo t"=-oo 

+ ^^/3.)g-2i,rr'At(a-.«-C,/T+6+.)^*,,(^(fe) ^ ^^/y)g2i.r"At(-(.('=)+^,/T)+6) 
g-i.(iV-l+.)At(.+(.^'=>+../T)-.(0_^,/^)^^_^^^ 1^^^^^ ^ ^^(,) ^ ^^^^^ _ ^(0 _ ^^1^^^^ 
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As of yet we have not used the model, or made any approximations. We see from this expression that we only need to worry 
about k = I (as otherwise contributions become negligible) and so we obtain for some 7 > 

var{^,(i/)} = J2 ^* E E D^_|,|(Ais)e-2--'^*(«-'^^''-^+''+'')e2--''^*(-('^o'='+')./m6) 

k=oJs=hJb=h t' = -oo t"=-oo 

^_i„(jV-l+.)Atfe/T-|./T)g2i.(.^'=) + ^).At / / ^^,(^(0 ^ + r]k/T) 

+ m/T - e,/r))) + 0{1). 

We implement a final change of variable of 

and find 



a = TiT)s, 



var{A.(z.)} = ^/ / EE 

^_i^(iV-l+r)At(,„/T-ij,/T)g2i7r(i/<'=' + ^)TAt ^5 

(A/' - |T|)At 

/ / + a/T(r))yi;„(^('=) + %/T(r)) 

^^-kl N-\r\ ) + ^^"^^^ 

Then for a suitable 7 > after implementing the integral in the variable b and restricting the range of integration so that the 
Umit of the integrals increasing in N is non-negligible: 



{Mu)} = E E (2^-H)e-^-(^'-^"'M^-H)sinc(^(r'-r")(^-|z. 



I At 



/'/V_l |\2 sill(7rQ) -2i7rT'At(-i^*'''+i/) -2i7rT" Ati^**"' -iir (JV- 1 + t) At (77^ /T-^^ /T) 2i7ri/<'°' T At du 

^ '^'^ ^r^^ (N-\t\)U 

pT^ lo nT^ Ip, c 



sin(7r(Q; + 77*, - ik)) d^k drjk 



7r(a + ?7fc -6) A^- kl N - \t\ 



+ 0{1) 



E r E E - e--(-'-^")-*(^-l'^l)sinc (.(r' - r") (i - |.| At 

k °^ r' — — 00 t" — — oo \ \ 

sin(7rQ) ^_2i^r'Af^^-i,.(.,fc-gfc)^2»,r^^'''(T-r"+T')At Z''^'^'* ^'^ A^, (l^'^'''' + ^k /T(t)) 

■wa {N ~\T\)At Jj^yi^ Jj.~,i^ ° 



= E ls^'='(-^'\r('=>)p (AT - \r\r'""-'j^'\5^'\u,r) + 0{1), (A-52) 

this defining J^^\6^''\u,t) where we have implemented an integral in a using 
sin(7rQ) sin(7r(Q + -qk - jk)) _ /""'"(s' w^ + s) 



/•°° sin(7ra) sin(^(Q + yk - gfc)) ^2i.a^^^ ^ n'^K^-^^^) ^^ 
y_oo 7r(a + ??fe-Cfc) ymax(-i,j^) 

sin (7r((>7fe -6)(l+ ^))) 
7r(% - ^fc 



= e"*'' ^"'^ "^^ ^ ^ \ ^ ' ' . (A-53) 
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We then get with fih = r' — Tq'^ as well as f2k = t" — Tq''^ 

V ; J -co J -co + Ti\] [vl + T2\] 

Sine (w{hk - f2k) Q - k|At^^ e-2i^(fi,+rr)At.g2i^.('='(r+fi.-f.,)Atg2i^4i(-r'|,-r''r,,+r^,) 

(tt ((r?fc - a) (l + AR 



We define the projection operator (assuming 5^''^ G [j, l]): 



d^fc drjkdfikdf2k + o(l). (A-54) 



-«^(fn,-f2^,)At(5ij-|^l-2,.(°>) 



sine ( 7r(fifc - f2fc) ( ^ - \v\M ) ) d?7fc df2fc, (A-55) 



TV9,{o.) = € [0, l\)2''''\2nu:r''+'''"' E^_^/(,^, _ ^^At e [max(-i.At, 0), min(l - z.At], ^)). 

Finally we note 

1 ffcl ffcl /"OO /'OC -1 

jf)(5('=\i.,r) = J^e-^*-^*(^o -"0 / / ^— ^e-^'-^"''^*7'ffi(6,ri)d^fcdfi+o(l). (A-56) 

We have ijj{v) = [oJi + i^At,w2] (again assuming 6'^''^ G [j, l]) 

^(^^^e— = 2--(2..(.At))-+--^t^. (A-57) 
I & + '''ifeJ J ^ ' 

Using Plancherel theorem we have 

7(a;i € [0, l])2i-^^(27ra;)-^+^*"' ^^^^^7(a;2 - Ati.*") € [max(-z.At, 0), min(l - |i.|At], Ijjd^o; 

+0(1). 

Again the limits of these truncations are set Uke in the expectation limits. Truncating the integrals using the indicator functions 
we find that: 

M [ V{5(^)) J Jo ■/Atmax(i.^°'-!.,0) (27r)^ 

From this it is possible to deduce that ^2 '^^ {^^^^ -i is finite for the previously mentioned range of j < S^'^^ < | . Furthermore 
the latter integral has the second range depending on ^ — \iy'\At, and so it follows: 

y^v{Mu)} = o|^i-(^^-|H)5^(Ar-|r|)*^"''-ij. (A-59) 

The finiteness of the limiting integrals therefore follow from exactly the same calculations as in the case of the mean. This 
should be compared in units with our expression for the expectation square. We see that the units agree, but that the expectation 
will be large compared to the variance, and should therefore be recoverable. We have if {u, r) e Vk with 6 = max 6k 

^ =^ I / 

vax{Ar{u)} 

which naturally is independent of the units of the problem, and becomes larger with increasing N and decreasing At. If 
^(fc) = S for all k then this normaUzed expectation does not depend on individual S^''^ but only needs 6 > 0. 
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C. Relation of the EMAF 

Finally to determine all properties of the ambiguity function we also need to determine its relation sequence. We find by 
direct calculation that (extending in [27, Eqn A-7] to At ^ 1): 

1 1 f 1 1 f' 

~| 1 /■2At /■ 2At J /■ 2At /■ 2At J s. / / s 

i/O 'J — f •J — f 

g2i,r/rAtg |^^(^, + f)dZ* {f)dZ {v" + f')dZ*{f')} 

We have assume that dZ{f) is a Gaussian process and then use Isserliss theorem to obtain that: 

(E + + /')} E {dZ* {f)dZ*if')} 

+E {dZiiy' + f)dZ* (/')} E {dZ* {f)dZ{v" + /')}) 
^i^(7V-i+r)An."-.)^^_l^l (Ai(z." - I/)) e^*-/'-^* dz/' df dv' df. 

Again, with the assumption that dZ{f) is proper we are left with 

> /■ 2At /■ 2At J r 2At r 2&t J , , . , / , , « 

J "'/=0 Jiy'=~f Jf'=0 Jv" = -f' 

E + /)dZ* (/')} E {dZ* {f)dZ{v" + /')} 

JO J-f Jo J-f 

S{v' + f- /', r)Si,." + f - /, /)Z?^,_|,| (Ai(i." - ly)) dv' df dv' df. 

We only expect to get contributions to the dual-frequency spectrum if j/' w — / + /' + fp*^^ and v" k, f — f ■\- for some k 
and L Furthermore the Dirichlet kernels will only contribute unless v' ^ v and v" v.lf v = Q then this is not a problem, and 
the ambiguity function is real. If ^ v^^^ for some k then the Dirichlet kernels will have arguments like ±(/ — /') +t'o'^ — v, 
and so if one contribution is large, the other is small. To bound contributions the integrals are done explicitly in the ambiguity 
domain, and it can be shown that for the empty points the contributions are o{N — \t\). For a strictly underspread process, it 
is possible to show they are O (log(A^)), see [27]. 

B. Correlation of EMAF for White Noise 
We find that the correlation for white noise is given with ^r{v) = E l^r(i^)} by (see Eqn ( |A^ ) 

^ri(!^l), Ar2(!/2)| +MTi(j'l)Mr2(j'2) = J ^ J J ^ 

Djv-|ri|(At(i^' - Ur))E{dZ{u' + f)dZ'{f)dZ'{u" + /2)d^(/2)} e-*'^('-'"-''2>^*('^-l"2l-^)e-""^^"^^' 
Dn.\^,_\(M{v" -V2)). (B-62) 

Thus due to the propriety of dZ{f) we have 

COY[Ar,{vi),Ar,{v2)}= ^{dZ{v' + f)dZ*{v" + f2)} 

E {dZ{f2)dZ*{f)} e*'^('^'-''i'^*(^-l^il-i)e*2^-^^i^*DAr_|^,l (At(z^' - i^i)) 

^_„(^"_^,)At(Ar-|r.|-l)g^»2^/2r.At^^_l^^l _ ^^-))_ (g_g3) 

Thus 

COY{A^,{u,),Ar,(l^2)\= / / / a'Siu'+f-u"-f2) 

^ ' -'-jit "ikt -'max(i'',0) imax(i/",0) 

Dn-\t2\{^'^W ~ ^2)) dfi df2dv'dv". 



2 At I 2 At ■> (' 2At I 2 At ■ 1 ' \ \+f -m i i 1^ 'nj- \* 

' ' ' ^i7r{i/ —h'i)At{N—\Ti\—l)^x27vfTxAt 
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This simplifies to 



•{X,(^l),X,(l/2)} 



237 fSSt /'■"'"(sSt ""''JSt ~^"'23t) 



2At 2At 



' max(i/' . /y" ,0) 



CT 0(1/ — V ) 



.4 i7r{i/'— i^i)At(iV— |ti|-1) i2ir/(Ti -T2 ) At 



(7 e 



pi27r/At(Ti-T2) 



i2-KAt{Tl - T2) 



™"(2St-'''.l/2) 



max(i^' ,0) 



We now implement a change of variables of v' = ^/[At{N — max(ri,T2))] and find with vi = ji/{At{N — max(Ti,T2))) as well as 
1^2 = 32/{At{N - max(Ti,T2))) : 



° sin(7r(5- ji))sin(7r(C-ji2)) 



i2,r/At(Ti-r2) -|mm(l/2At-,/i,l/2) 



i27r(Tl - T2)At 



,iTr(j2-jl) 



max(i/i ,0) 



= (Af — max(ri, T2))cr 
= (iV — max(Ti, T2))cr'' 

= (iV — max(ri,T2))cr* 



27r/At(Ti-T2) -| min(l/2At-max(i/i,i'2),l/2) 



i2n{Ti — T2) 



,<-n"(j2-il) 



exp(27rAt(l/2Af — mcix(i'i, i'2))i(ri — T2)) — exp(27rmax(i/i, z/2)«Af(Ti — T2)) 



i27r(ri — r2) 

^_l)Ji-32 exp(— 27rAt max(fi, !^2))4(ti — r2)) — exp(27rAt max(i'i, z/2)«(ti — r2)) 



i27r(Ti — T2) 



gix02-.-.)5.^_.^+O(l). 



This is zero depending on the frequencies of choice. If we fix 1^1 then we have to pick times that agree with that choice of 1^1 whilst if 
we fix n we can make the process independent without worrying about the value of n (apart from it being 0(1)), by just choosing the 
Fourier coefficients (this becomes more involved when r is no longer order one). Interestingly, we can pick the t values arbitrarily if we 
change the relative frequencies. 

C. EBAYES Calculations 

These calculations resemble strongly those of [31]. We first marginalize the likelihood over the non-observed Briy). We find that starting 
from Eqn. (69) that 



/(AWK),V) = // 

(1-p) 



(1 - p)e- 



ttV 



14"' (..)|- 



-<5(S.K)) + 



pe 



-e ^ 



This implies that our marginal likelihood for Qri^k) is in fact: 



/(QrK),V') 



(I - r,) ^ (QrC-^fc)) o 
2ii=^Q.K)e ^+2=^^Q.K)e 



2 
C 



Thus: 



— e V + = 

V V + aj. 



(C-64) 
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However if we just marginalize Eqn (69) over the phase then we have 

7rVcT£ 71" Jq 



V 

e 'l d'&r{v). 



We define the posterior ratio to be 



(V+<t2) 



e v+<7^ 



Thus we may write 



f {B^{v)\Q^{uk)) = 



(i-p) 

V 



?2 



(i-p) . 





2 






1 P 





V V+a;^ 

With this definition we find that: 

h(B\Qr(v)) 

Thus if we marginalize yet again over the angle then we find that: 

;Q coa({ + tf) 



(V + a^) ^ 1 



7rVa2 27r io 



27r |A-e|" _|B|^ 

e V e ^ 



/i(S;Q) = 







Q 
27r2 






Q 

27r2 


Vct2 




Q 

e 



^27r /'27r i2 , o2_ 



2ir i<2v+<l> o2 + o2 



Thus we see that using [44, 3.339]: 



/i(Q;Q) 



Jo 

-I 



Q^ + Q^_2QecOs({) 



e V d^. 



VA 



.2QQ 



2 ^ 
^e~'^^e~ :>>v QJq 
VA 



1 



.2|Q^ 



iV ( AQ, -AV 



We then have 



(C-65) 



(C-66) 



(C-67) 



(C-68) 



(C-69) 



r-S/VAV 

2 / ue 



■J 

Jo 



^ 7 Jo I -2*M^ 1 dw- 



(C-70) 



We can check with GradshtejTi et al. [44, 6.614] that this CDF integrates to one, and recognize this is a Rice distribution with parameters 
cr^ = |AV and n = XQ. When the mean becomes large compared to the standard deviation, e.g. when 



then there can be no danger in using such an approximation. 



^»1 
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D. Variance of Estimator 

To invert the ambiguity function into a local moment sequence we calculate 



fc = -(JV-l) 

JV-l JV-l 



fcl=-(JV-l) fc2 = -{JV-l) 
JV-l JV-l 



1 ^ ^ g«2.(.,^-.,,)t„Q^(^^^)Q*(^^^) 



fcl=-(JV-l) fc2 = -{JV-l) 



COV ^Ar{Vk^), Ar(j^fc2)| 

JV-l JV-l 

■{^^(*")} = 4AW E E c''^'^''^-''^^'*"cov{1.KJ,A.K2)}. (D-71) 



fel=-(JV-l) fc2 = -{JV-l) 



We see from appendix |B] that the correlation between frequencies for white noise with the same lag will become small if Vki and Uk^ are 
on the grid i^fc(r) = jfz:^. The effect of being slightly off that grid (for small r) is negligible. Therefore if we consider this case 



^ JV/2-1 

var{Mr'(tn)} ^ e?(/.,)var{A.(^.2.j} 

fc = -(JV/2-l) 

1 

- r^^' (dl{v)Y^x[AAv)} du. 



T 

As 0r(i') = with probability p, and /o ~ the variance will decrease with for white noise 



2 



